 speech engine


Bringing Live Transcribe's Speech Engine to Everyone

#artificialintelligence

Earlier this year, Google launched Live Transcribe, an Android application that provides real-time automated captions for people who are deaf or hard of hearing. Through many months of user testing, we've learned that robustly delivering good captions for long-form conversations isn't so easy, and we want to make it easier for developers to build upon what we've learned. Live Transcribe's speech recognition is provided by Google's state-of-the-art Cloud Speech API, which under most conditions delivers pretty impressive transcript accuracy. However, relying on the cloud introduces several complications--most notably robustness to ever-changing network connections, data costs, and latency. Today, we are sharing our transcription engine with the world so that developers everywhere can build applications with robust transcription. Those who have worked with our Cloud Speech API know that sending infinitely long streams of audio is currently unsupported.
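One common way to work around a limit on stream length is to cut the continuous microphone feed into bounded sessions and open a fresh streaming request for each one. The sketch below illustrates only that grouping step, using the standard library. The function name `split_into_sessions`, the millisecond parameters, and the limit value are hypothetical and are not taken from the Live Transcribe source or the Cloud Speech API.

```python
from typing import Iterable, Iterator, List


def split_into_sessions(
    chunks: Iterable[bytes],
    chunk_ms: int,
    max_session_ms: int,
) -> Iterator[List[bytes]]:
    """Group a (possibly endless) stream of audio chunks into bounded sessions.

    chunk_ms and max_session_ms are illustrative parameters: the duration of
    each chunk and an assumed per-request limit, both in milliseconds.
    """
    if chunk_ms <= 0 or max_session_ms < chunk_ms:
        raise ValueError("max_session_ms must hold at least one chunk")

    session: List[bytes] = []
    elapsed_ms = 0
    for chunk in chunks:
        # Close the current session before this chunk would exceed the limit.
        if session and elapsed_ms + chunk_ms > max_session_ms:
            yield session
            session, elapsed_ms = [], 0
        session.append(chunk)
        elapsed_ms += chunk_ms

    # Flush whatever is left when the input ends.
    if session:
        yield session
```

In a real client, each yielded session would be sent as its own streaming request. Handling audio that straddles a session boundary is left out of this sketch.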


Google open-sources Live Transcribe's speech engine

#artificialintelligence

The company hopes doing so will let any developer deliver captions for long-form conversations. The source code is available now on GitHub. Google released Live Transcribe in February. The tool uses machine learning algorithms to turn audio into real-time captions. Unlike Android's upcoming Live Caption feature, Live Transcribe is a full-screen experience, uses your smartphone's microphone (or an external microphone), and relies on the Google Cloud Speech API.


Text to speech Python Tutorial

#artificialintelligence

We can make the computer speak with Python. Given a text string, it will speak the written words in the English language. This process is called Text To Speech (TTS). pyttsx is a cross-platform text-to-speech wrapper that uses the speech engine native to the operating system. The separate gTTS module uses the Google Text to Speech (TTS) API.
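A minimal sketch of speaking a string from Python, assuming the third-party `pyttsx3` package (a maintained fork of the pyttsx wrapper) is installed and a native speech engine is available. The helper names `pick_driver` and `speak` are illustrative and do not come from the tutorial.

```python
import platform
from typing import Optional

# Driver names that pyttsx3 accepts for each operating system's native engine.
_DRIVERS = {"Windows": "sapi5", "Darwin": "nsss", "Linux": "espeak"}


def pick_driver(system: Optional[str] = None) -> Optional[str]:
    """Return the pyttsx3 driver name for a platform, or None if unknown."""
    name = system if system is not None else platform.system()
    return _DRIVERS.get(name)


def speak(text: str) -> bool:
    """Speak text aloud; return False if pyttsx3 is not installed."""
    try:
        import pyttsx3  # third-party; assumed installed via pip
    except ImportError:
        return False
    # Passing None lets pyttsx3 choose its default driver for the platform.
    engine = pyttsx3.init(pick_driver())
    engine.say(text)
    engine.runAndWait()  # blocks until the queued speech has finished
    return True
```

Calling `speak("Hello, world")` queues the sentence and blocks until it has been spoken. On a machine without `pyttsx3` it returns `False` instead of raising.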


What the Voice-Recognition Industry Needs Most

AITopics Original Links

I'm a big believer that voice-recognition technology will play an increasingly prominent role in how we interact with technology -- so much that I've made a bet on Nuance Communications (NAS: NUAN) accordingly as the clear technological leader in the field. So when the CEO of a small voice-recognition software company, Datria, reached out to me for a conversation, I jumped at the chance. Datria is a small private player with about 50 employees, and it resells Nuance's speech engine while also counting software giant SAP (NYS: SAP) as an investor. Jim Greenwell has been CEO for 12 years, and he provided valuable insight into the industry at large, as well as what trends users and investors should be on the lookout for. Who will rally the troops?